Automatic Construction and Refinement of a Class Hierarchy over Semi-Structured Data

Authors

  • Nathalie Pernelle
  • Marie-Christine Rousset
  • Véronique Ventos
Abstract

In many applications, it is crucial to help users access a huge amount of data by clustering it into a small number of classes described at an appropriate level of abstraction. In this paper, we present an approach to the automatic clustering of semistructured data that relies on two class description languages. The first language has high abstraction power and guides the construction of a lattice of classes covering the whole data set. The second language, more expressive and more precise, is the basis for refining the part of the lattice that the user wants to focus on. Our approach has been implemented and tested on real data within the GAEL project, which aims at building flexible electronic catalogs organized as a hierarchy of classes of products. Our experiments were conducted on real data from the C/Net (http://www.cnet.com) electronic catalog of computer products.
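
As an illustration only, the coarse-then-fine idea described in the abstract can be sketched in Python. The abstract does not define the two description languages, so everything here is an assumption: the coarse description is taken to be the set of attribute names of a product, the finer description is taken to be its set of (attribute, value) pairs, and classes are obtained by closing the descriptions under intersection. The toy catalog and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy catalog: each product is a dict of attribute -> value.
products = {
    "p1": {"brand": "A", "cpu": "x86", "ram": "8GB"},
    "p2": {"brand": "B", "cpu": "x86", "ram": "16GB"},
    "p3": {"brand": "A", "resolution": "1080p"},
    "p4": {"brand": "C", "resolution": "4K", "ram": "2GB"},
}


def coarse(item):
    """Assumed coarse description: attribute names only."""
    return frozenset(item)


def fine(item):
    """Assumed finer description: (attribute, value) pairs."""
    return frozenset(item.items())


def close_under_intersection(descriptions):
    """Add pairwise intersections until no new description appears."""
    closed = set(descriptions)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(closed), 2):
            meet = a & b
            if meet not in closed:
                closed.add(meet)
                changed = True
    return closed


def build_classes(items, describe):
    """Map each closed description to the ids of the items it covers."""
    closed = close_under_intersection({describe(v) for v in items.values()})
    return {
        d: {k for k, v in items.items() if d <= describe(v)}
        for d in closed
    }


# Coarse classes over the whole catalog.
lattice = build_classes(products, coarse)

# Refine only the class the user focuses on, using the finer description.
focus = lattice[frozenset({"brand", "cpu", "ram"})]
refined = build_classes({k: products[k] for k in focus}, fine)
```

In this sketch the classes are ordered by inclusion of their descriptions: a smaller description covers more products. Only the focused class is re-described with the finer language, which keeps the global structure small.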


Similar resources

1 Ontology Learning

Ontology Learning greatly facilitates the construction of ontologies by the ontology engineer. The notion of ontology learning that we propose here includes a number of complementary disciplines that feed on different types of unstructured and semi-structured data in order to support a semi-automatic, cooperative ontology engineering process. Our ontology learning framework proceeds through ont...


Object-Based Classification of UltraCamD Imagery for Identification of Tree Species in the Mixed Planted Forest

This study is a contribution to assess the high resolution digital aerial imagery for semi-automatic analysis of tree species identification. To maximize the benefit of such data, the object-based classification was conducted in a mixed forest plantation. Two subsets of an UltraCam D image were geometrically corrected using aero-triangulation method. Some appropriate transformations were perfor...


Model Checking with Abstraction Refinement for Well-Structured Systems Master Thesis

Abstraction plays an important role in the verification of infinite-state systems. One of the most promising and popular abstraction techniques is predicate abstraction. The right abstraction, i.e. the one that is sufficiently precise to prove or disprove the property under consideration, is automatically constructed by iterative abstraction refinement. The abstract-check-refine loop is not gua...


Multi-view Exploratory Learning for AKBC Problems

In this paper, we argue that many Automatic Knowledge Base Construction (AKBC) tasks which have previously been addressed separately can be viewed as instances of a single abstract problem: multiview semi-supervised learning with an incomplete class hierarchy. We also present a general EM framework for solving this abstract task, and summarize past work on various special cases of multiview semi-...


The Rufus System: Information Organization for Semi-Structured Data

While database systems provide good function for writing applications on structured data, computer system users are inundated with a flood of semistructured information, such as documents, electronic mail, programs, and images. Today, this information is typically stored in filesystems that provide limited support for organizing, searching, and operating upon this data. Current database syste...




Publication date: 2001